
gitgud5000 (Contributor) commented May 31, 2025

TODO(deepyaman): Move notes on credentials support to a new PR from @gitgud5000 with those changes, once it exists. I'm not deleting the detailed notes from here until they're moved elsewhere.

feat(datasets): ibis.TableDataset add mode and credentials support

  • feat: Introduces a mode parameter for save operations, allowing "append", "overwrite", "error", and "ignore" options to control write behavior.
    • Supports legacy overwrite flag with backward compatibility and deprecation warning.
    • Adds mode dispatching logic to handle different write scenarios.
    • Prevents simultaneous use of both mode and legacy overwrite to avoid ambiguity.
    • Updates examples and docs to reflect new parameter.
  • feat: Adds a credentials parameter to accept connection info (dict or string URI), superseding connection.
    • Raises a warning if both credentials and deprecated connection are provided.
    • Adds a backend-name extraction helper and adjusts _describe() to use it.
    • Improves documentation to reflect the preferred usage.

Semi-breaking change: overwrite and connection are deprecated; users should migrate to mode and credentials.

Description

This PR introduces two key enhancements to ibis.TableDataset:

  1. Configurable Save Modes:

    • Adds a mode parameter to save_args similar to Spark’s DataFrameWriter.mode and pandas’ to_csv(mode=...), with support for:
      • "append": Insert data into an existing table (requires the backend to implement insert()).
      • "overwrite": Drop and recreate the table/view.
      • "error" or "errorifexists": Fail if the table already exists.
      • "ignore": Do nothing if the table exists; otherwise, create it.
    • Backward compatible: legacy overwrite=True|False maps to mode="overwrite"|"error".
    • Raises an error if both mode and overwrite are specified simultaneously.
  2. credentials Parameter:

    • Introduces credentials as the preferred method for specifying Ibis backend connection configurations.
    • Accepts:
      • A string connection URI.
      • A dictionary of connection parameters, optionally including a con connection string.
    • Supersedes the older connection parameter.
    • Warns if both are provided.
    • Updates _connect() and _describe() to handle the new parameter.
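The save-mode rules in item 1 (mapping the legacy overwrite flag, rejecting mode and overwrite together) can be sketched as a small normalization step. This is a minimal sketch assuming save_args is a plain dict: normalize_save_args and VALID_MODES are hypothetical names, the ValueError type is an assumption, and the stdlib DeprecationWarning stands in for whatever warning class the dataset actually emits. The "overwrite" default follows the DEFAULT_SAVE_ARGS note in the development notes.

```python
import warnings

# Modes listed in the PR description; "errorifexists" is an alias of "error".
VALID_MODES = {"append", "overwrite", "error", "errorifexists", "ignore"}


def normalize_save_args(save_args: dict) -> tuple[str, dict]:
    """Split user save_args into (mode, remaining writer kwargs). Hypothetical helper."""
    args = dict(save_args)  # copy so the caller's dict is not mutated
    if "mode" in args and "overwrite" in args:
        raise ValueError("Pass either 'mode' or the legacy 'overwrite', not both.")

    if "overwrite" in args:
        warnings.warn(
            "'overwrite' is deprecated; use mode='overwrite' or mode='error'.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Legacy mapping: True -> "overwrite", False -> "error".
        mode = "overwrite" if args.pop("overwrite") else "error"
    else:
        # Pop 'mode' so it never leaks into the backend writer's kwargs.
        mode = args.pop("mode", "overwrite")

    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"Invalid mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    return ("error" if mode == "errorifexists" else mode), args


print(normalize_save_args({"mode": "append", "materialized": "table"}))
# ('append', {'materialized': 'table'})
```

Popping mode out of the returned kwargs mirrors the later fix in this PR for a lingering 'mode' key leaking into the writer kwargs.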

Development notes

  • Closes #834, “Support for inserting data in ibis.TableDataset” (requested by @deepyaman on Sep 14, 2024, to add “insert” support via a mode argument).
    • In the parent issue #1174, “Enhance current integration between Kedro and Ibis”, the main ask was to allow an “append”/“insert” operation for Ibis datasets.
    • This implementation covers:
      • Reworking the API akin to Spark’s DataFrameWriter.mode and pandas’ to_csv(mode=…).
      • Backward compatibility with the legacy overwrite behavior.
      • Clear, explicit behavior for each mode option.
      • Support for passing credentials in the different formats supported by ibis.connect().

Save Mode Handling

  • DEFAULT_SAVE_ARGS updated to use mode="overwrite" by default.
  • __init__:
    • Validates mode/overwrite presence.
    • Maps overwrite → mode internally.
  • save() dispatches based on mode:
    • append → calls insert() if supported, else raises NotImplementedError.
    • overwrite, error, ignore → handled via create_table/create_view with appropriate overwrite flag.
  • _describe() includes mode.
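The dispatch described above can be sketched with direct conditionals (a later commit in this PR replaced a function-dispatch dict with this style). InMemoryBackend is a hypothetical toy; its list_tables, create_table, and insert methods only mimic the call shapes of the Ibis backend methods of the same names. The materialized-view branch (create_view) is omitted, and this early version of append does not yet create a missing table.

```python
class InMemoryBackend:
    """Hypothetical stand-in for an Ibis backend; tables are plain lists of rows."""

    def __init__(self) -> None:
        self.tables: dict[str, list] = {}

    def list_tables(self) -> list[str]:
        return list(self.tables)

    def create_table(self, name: str, obj, *, overwrite: bool = False) -> None:
        if name in self.tables and not overwrite:
            raise ValueError(f"Table {name!r} already exists")
        self.tables[name] = list(obj)

    def insert(self, name: str, obj) -> None:
        self.tables[name].extend(obj)  # KeyError if the table does not exist


def save(con, name: str, data, mode: str) -> None:
    """Dispatch a table write on `mode` with plain conditionals (sketch)."""
    if mode == "append":
        if not hasattr(con, "insert"):
            raise NotImplementedError(f"{type(con).__name__} does not implement insert()")
        con.insert(name, data)
    elif mode in ("overwrite", "error"):
        # With overwrite=False, the backend itself raises if the table exists.
        con.create_table(name, data, overwrite=(mode == "overwrite"))
    elif mode == "ignore":
        if name not in con.list_tables():
            con.create_table(name, data)
    else:
        raise ValueError(f"Unsupported mode: {mode!r}")


con = InMemoryBackend()
save(con, "events", [1, 2], mode="overwrite")
save(con, "events", [3], mode="append")
save(con, "events", [9], mode="ignore")  # table exists, so nothing is written
print(con.tables["events"])  # [1, 2, 3]
```

Letting create_table(overwrite=False) raise for "error" keeps the existence check in the backend rather than duplicating it in the dataset.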

Credentials Support

  • Added the credentials parameter to the __init__ signature and docstring.
  • Accepts either:
    • A connection string (e.g. "duckdb:///my.db")
    • A dict with connection params or a con string.
  • _connect() routes accordingly based on type.
  • _get_backend_name() extracts backend info for _describe().
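The credentials routing above can be sketched without Ibis installed by returning a description of the connect call instead of making it. resolve_connect_call and get_backend_name are hypothetical names. Routing a dict with a backend key to ibis.<backend>.connect(**params), and forwarding a dict's extra keys as keyword arguments to ibis.connect(), are assumptions about how the three tested formats (string, dict-with-con, dict-with-backend) map onto Ibis. Per the TODO at the top, this part is being moved to a follow-up PR.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def resolve_connect_call(credentials: str | dict) -> tuple[str, str | None, dict]:
    """Describe the connect call for `credentials` as (function, uri, kwargs)."""
    if isinstance(credentials, str):
        # A bare connection URI, e.g. "duckdb:///my.db".
        return "ibis.connect", credentials, {}
    if isinstance(credentials, dict):
        params = dict(credentials)  # copy so the caller's dict is untouched
        if "con" in params:
            # Dict carrying a URI under "con"; remaining keys become kwargs.
            return "ibis.connect", params.pop("con"), params
        if "backend" in params:
            # Dict naming a backend; remaining keys go to that backend's connect().
            backend = params.pop("backend")
            return f"ibis.{backend}.connect", None, params
    raise TypeError("credentials must be a URI string or a dict with 'con' or 'backend'")


def get_backend_name(credentials: str | dict) -> str | None:
    """Best-effort backend name for _describe(): the 'backend' key or URI scheme."""
    if isinstance(credentials, dict):
        if "backend" in credentials:
            return credentials["backend"]
        credentials = credentials.get("con", "")
    if isinstance(credentials, str):
        return urlsplit(credentials).scheme or None
    return None


print(resolve_connect_call({"backend": "postgres", "host": "localhost"}))
print(get_backend_name("duckdb:///my.db"))
```

Returning a description rather than a live connection keeps the parsing logic testable on its own; a real implementation would call the corresponding Ibis function instead.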

All supported mode options were tested using both DuckDB and Postgres backends.
credentials parsing was tested for string, dict-with-con, and dict-with-backend formats.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Updated jsonschema/kedro-catalog-X.XX.json if necessary
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

…ble materialization

- feat: Introduces a `mode` parameter for save operations, allowing "append", "overwrite", "error", and "ignore" options to control write behavior.
  - Supports legacy `overwrite` flag with backward compatibility and deprecation warning.
  - Adds mode dispatching logic to handle different write scenarios.
  - Updates examples and docs to reflect new parameter.
  - Prevents simultaneous use of both `mode` and legacy `overwrite` to avoid ambiguity.
  - docs: Improves documentation and usage examples for new save modes.

Semi-breaking change: Replaces `overwrite` save argument with `mode`; users should update configurations to use `mode`.

Signed-off-by: gitgud5000 <[email protected]>
gitgud5000 force-pushed the ibis-dataset-savemode-support branch from 2e9e539 to a41bb19 on May 31, 2025 01:41
gitgud5000 force-pushed the ibis-dataset-savemode-support branch from a41bb19 to 1f609c5 on May 31, 2025 01:44
deepyaman (Member) left a comment


Thanks for the PR!

Did a quick-pass review. I see you're still actively updating, so I'll revisit this later.


self._materialized = self._save_args.pop("materialized")

# Handle mode / overwrite conflict
Member

Wonder if insert + view needs to be disallowed; I haven't actually checked yet.

gitgud5000 (Contributor, Author) commented May 31, 2025

I've been actively working with this because I need the features now 😅.

I've also added support for passing credentials to ibis.TableDataset in this PR; I'll update the OP shortly to reflect this.
Is that OK, or should I create a separate PR for this feature? @deepyaman

gitgud5000 changed the title from "feat(datasets): ibis.TableDataset add configurable save mode for table materialization" to "feat(datasets): ibis.TableDataset add configurable save mode for table materialization and credentials support" on May 31, 2025
…precate connection

- feat: introduce a new `credentials` parameter to accept connection info as a string or dict, preferred over the deprecated `connection` param
  - warn if both `credentials` and `connection` are provided, prioritizing `credentials`
  - support connection strings and dicts with connection strings for backend connections
- feat: add backend name extraction to be used in `_describe()` method
- docs: update docstrings to explain `credentials` and deprecation of `connection`

Semi-breaking change: deprecates the `connection` parameter in favor of `credentials`

Signed-off-by: gitgud5000 <[email protected]>
…lacing the function dispatch dict with direct conditional handling, improving clarity and maintainability

- docs: expands and clarifies docstring to explain available save modes and their effects, referencing Spark semantics for familiarity

Signed-off-by: gitgud5000 <[email protected]>
gitgud5000 force-pushed the ibis-dataset-savemode-support branch from d545871 to 6445239 on May 31, 2025 03:52

deepyaman (Member) commented May 31, 2025

> I've been actively working with this because I need the features now 😅.

Best reason. 😁

> I've also added support for passing credentials to ibis.TableDataset in this PR; I'll update the OP shortly to reflect this. Is that OK, or should I create a separate PR for this feature? @deepyaman

Sorry, I didn't check in time; a separate PR would be ideal, but it's probably no big deal to have it in the same one. If it ends up blocking the merge, we can always split it out later.

ankatiyar requested a review from deepyaman on June 3, 2025 14:52
gitgud5000 marked this pull request as draft on June 4, 2025 11:13
gitgud5000 marked this pull request as ready for review on August 15, 2025 18:49
- fix: add early return for empty DataFrame
- fix: ensure table creation occurs before insert operations in append mode when table doesn't exist

Signed-off-by: gitgud5000 <[email protected]>
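The two fixes in this commit can be sketched as an append helper over a toy backend; InMemoryBackend and append are hypothetical names. The commit does not say whether the missing table is created from the data itself or from its schema before inserting, so this sketch simply creates it from the first non-empty batch. len() works for Python lists and pandas DataFrames; an Ibis table expression would need something like table.count().execute() instead.

```python
class InMemoryBackend:
    """Hypothetical stand-in for an Ibis backend; tables are plain lists of rows."""

    def __init__(self) -> None:
        self.tables: dict[str, list] = {}

    def list_tables(self) -> list[str]:
        return list(self.tables)

    def create_table(self, name: str, obj) -> None:
        self.tables[name] = list(obj)

    def insert(self, name: str, obj) -> None:
        if name not in self.tables:
            raise KeyError(f"Table {name!r} does not exist")
        self.tables[name].extend(obj)


def append(con, name: str, data) -> None:
    """Append `data` to table `name`, applying both fixes from this commit."""
    if len(data) == 0:
        return  # empty input: skip the write entirely (no table is created)
    if name not in con.list_tables():
        # Create the table from the first batch instead of inserting into nothing.
        con.create_table(name, data)
    else:
        con.insert(name, data)


con = InMemoryBackend()
append(con, "events", [])      # no-op
append(con, "events", [1, 2])  # table missing, so it is created
append(con, "events", [3])     # table exists, so rows are inserted
print(con.tables)  # {'events': [1, 2, 3]}
```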
gitgud5000 force-pushed the ibis-dataset-savemode-support branch from 40bbfe3 to 6130581 on August 15, 2025 20:15

gitgud5000 (Contributor, Author)

Could you please take a look at this pull request, @ankatiyar @ravi-kumar-pilla?

ankatiyar (Contributor)

@gitgud5000 I'll take a look; in the meantime, could you add some unit tests to go along with the changes? :D

- fix: remove lingering 'mode' key when mapping legacy overwrite to prevent unexpected writer kwargs
- fix: treat empty pandas DataFrame as a no-op in save, supporting both pandas and ibis tables

Prevents accidental parameter leakage and avoids errors when saving empty data.

Signed-off-by: gitgud5000 <[email protected]>
…and legacy overwrite behavior

Signed-off-by: gitgud5000 <[email protected]>
gitgud5000 force-pushed the ibis-dataset-savemode-support branch from db200f3 to 55b8666 on August 18, 2025 14:31

gitgud5000 (Contributor, Author) commented Aug 18, 2025

@ankatiyar Done ✅

I wasn’t able to resolve a couple of things:

  • All the tests I added pass locally, except for test_connection_config with the mssql config.
=========================== short test summary info ============================
FAILED kedro-datasets/tests/ibis/test_table_dataset.py::TestTableDataset::test_connection_config[None-None-None-connection_config1-key1]
  • The lint check pipeline is still failing, and I’m not sure why or how to fix it. Would appreciate your guidance 🙂

Signed-off-by: Ankita Katiyar <[email protected]>
gitgud5000 force-pushed the ibis-dataset-savemode-support branch from 63b95c1 to 4c6ddb9 on August 18, 2025 16:38
deepyaman force-pushed the ibis-dataset-savemode-support branch from f819670 to cda534a on September 23, 2025 23:30
deepyaman force-pushed the ibis-dataset-savemode-support branch from 1057b14 to 513aab0 on September 23, 2025 23:44
deepyaman changed the title from "feat(datasets): ibis.TableDataset add configurable save mode for table materialization and credentials support" to "feat(datasets): make table write mode configurable" on Sep 23, 2025
deepyaman force-pushed the ibis-dataset-savemode-support branch from 62d976a to 55ba8a0 on September 24, 2025 19:29

ankatiyar (Contributor)

The code looks good to me! I'm wondering whether we need to support both overwrite and mode and remove overwrite later, or whether we should support only mode and do a major release of kedro-datasets. There are some new datasets in the works (though experimental) that could warrant a breaking release. (cc @rashidakanchwala)

rashidakanchwala (Contributor)

Yes, the next release will likely be a major one (given the new GenAI datasets), so making the breaking change now could make sense.

That said, I’m not sure how widely the Ibis dataset is used or whether existing users would expect backward compatibility / a deprecation phase before overwrite is removed. If usage is low, it might be fine to go straight to mode and make it part of the next major.

deepyaman (Member) commented Oct 8, 2025

> That said, I’m not sure how widely the Ibis dataset is used or whether existing users would expect backward compatibility / a deprecation phase before overwrite is removed. If usage is low, it might be fine to go straight to mode and make it part of the next major.

I believe there are a fair number of users, and I'd rather keep the deprecation warning given that (1) it's already implemented and (2) it's not some complicated functionality to maintain, and there are no significant tradeoffs/downsides to having the functionality.

> Yes, the next release will likely be a major one (given the new GenAI datasets), so making the breaking change now could make sense.

What is the timeline for the next release? @gitgud5000 also implemented credentials support as part of this PR, which we split out and are moving to a follow-up PR. I assume that would also be good to get in for the next release.

deepyaman force-pushed the ibis-dataset-savemode-support branch from 586c24c to 6912ec6 on October 12, 2025 20:40
deepyaman requested a review from ankatiyar on October 12, 2025 20:41
deepyaman enabled auto-merge (squash) on October 12, 2025 22:40

ankatiyar (Contributor) left a comment


LGTM! Thanks @gitgud5000 and @deepyaman 💯

deepyaman merged commit 170cc68 into kedro-org:main on Oct 13, 2025 (13 of 14 checks passed)
gitgud5000 added a commit to gitgud5000/kedro-plugins that referenced this pull request Oct 19, 2025
…de argument (PR kedro-org#1093) and credentials support in the upcoming changes section.

Signed-off-by: gitgud5000 <[email protected]>

Labels: None yet

Projects: None yet

Development: Successfully merging this pull request may close these issues: Support for inserting data in ibis.TableDataset

4 participants